
Appendix

Neural Information Processing Systems

We limit the target languages for this augmentation process to Arabic, Finnish, Japanese, Korean, Russian, Spanish, Swedish, Hebrew, Thai, Danish, French, Italian, Dutch, Polish, and Portuguese. Interestingly, just adding this language code effectively changes the outputs, as shown in Table 7. We further subsample 50% of the synthetically generated questions. During inference, we first retrieve the top 15 passages using mDPR, and then feed the questions and concatenated passages into the mGEN model with language tags. The gray dots concentrated in the lower right part of the first figure represent encoded Thai embeddings.
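The retrieve-then-generate inference step described above can be sketched as follows. This is a minimal illustration with toy stand-ins: `retrieve_top_k` and `build_generator_input` are hypothetical helpers, not the actual mDPR retriever or mGEN generator, and the word-overlap scoring is only a placeholder for dense retrieval.

```python
# Sketch of the pipeline: retrieve top-k passages, then build the
# generator input as "<language tag> + question + concatenated passages".
# The retriever and generator here are toy stand-ins, not mDPR/mGEN.

def retrieve_top_k(question, corpus, k=15):
    """Toy retriever: rank passages by word overlap with the question."""
    q_words = set(question.lower().replace(".", "").split())
    scored = sorted(
        corpus,
        key=lambda p: -len(q_words & set(p.lower().replace(".", "").split())),
    )
    return scored[:k]

def build_generator_input(question, passages, lang_tag):
    """Prepend a language tag so the generator answers in that language."""
    return f"<{lang_tag}> {question} " + " ".join(passages)

corpus = ["Paris is the capital of France.", "Tokyo is the capital of Japan."]
passages = retrieve_top_k("capital of France", corpus, k=15)
model_input = build_generator_input("capital of France", passages, "fr")
```

In the real system the generator input would be tokenized and fed to the multilingual generation model; the language tag alone is what steers the output language, as the entry notes.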



From the MCMC perspective, we could treat these (already learned) models as proposals for the approximate MH-algorithm

Neural Information Processing Systems

We thank all of the reviewers for their valuable feedback and detailed comments. That is, "improvement and justification of any implicit sampler." We know that in practice even state-of-the-art generative models yield "unrealistic" samples, and hence are biased (Algorithm 3). Based on our theoretical analysis, we derive different losses for the discriminator (Table 1 in the paper).
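The idea of using an already-learned generative model as the proposal in an approximate Metropolis-Hastings scheme can be sketched with a standard trick: a discriminator score D(x) ≈ P(real | x) yields the density-ratio estimate p_data(x)/p_model(x) ≈ D(x)/(1 − D(x)), which is all an independence-MH acceptance test needs. This is a generic illustration under that assumption, not the paper's specific algorithm; the function names are hypothetical.

```python
import random

def density_ratio(d_score):
    """D(x) ~ P(real|x) gives the ratio p_data(x)/p_model(x) ~ D/(1-D).
    Scores are clamped away from 0 and 1 to avoid division by zero."""
    d = min(max(d_score, 1e-6), 1.0 - 1e-6)
    return d / (1.0 - d)

def mh_resample(samples, d_scores, n_steps=1000, seed=0):
    """Independence-MH chain over pre-drawn model samples: propose a
    random sample, accept with min(1, r(proposed)/r(current)), where
    r is the discriminator-based density ratio."""
    rng = random.Random(seed)
    idx = 0
    kept = []
    for _ in range(n_steps):
        j = rng.randrange(len(samples))
        accept = min(1.0, density_ratio(d_scores[j]) / density_ratio(d_scores[idx]))
        if rng.random() < accept:
            idx = j
        kept.append(samples[idx])
    return kept

# Two model samples: one the discriminator rates as realistic (0.9),
# one it rates as unrealistic (0.1). The chain should favor the former.
chain = mh_resample(samples=[0, 1], d_scores=[0.9, 0.1], n_steps=1000)
```

The chain spends most of its time on the high-ratio sample, which is how MH-style resampling can debias samples from an imperfect ("unrealistic") generator.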


A Transformer-Based Approach for Diagnosing Fault Cases in Optical Fiber Amplifiers

Schneider, Dominic, Rapp, Lutz, Ament, Christoph

arXiv.org Artificial Intelligence

A transformer-based deep learning approach is presented that enables the diagnosis of fault cases in optical fiber amplifiers using condition-based monitoring time series data. The model, the Inverse Triple-Aspect Self-Attention Transformer (ITST), uses an encoder-decoder architecture with three feature extraction paths in the encoder, feature-engineered data for the decoder, and a self-attention mechanism. The results show that ITST outperforms state-of-the-art models in terms of classification accuracy, enabling predictive maintenance for optical fiber amplifiers and reducing network downtimes and maintenance costs. In present-day optical transmission links, optical fiber amplifiers are key components of long-haul and metro fiber optical networks. Aging of these devices can result in slow but permanently increasing performance degradation, or even a complete outage of the affected link, leading to cost-intensive maintenance and significant loss of income.



IAUNet: Instance-Aware U-Net

Prytula, Yaroslav, Tsiporenko, Illia, Zeynalli, Ali, Fishman, Dmytro

arXiv.org Artificial Intelligence

Instance segmentation is critical in biomedical imaging to accurately distinguish individual objects like cells, which often overlap and vary in size. Recent query-based methods, where object queries guide segmentation, have shown strong performance. While U-Net has been a go-to architecture in medical image segmentation, its potential in query-based approaches remains largely unexplored. In this work, we present IAUNet, a novel query-based U-Net architecture. The core design features a full U-Net architecture, enhanced by a novel lightweight convolutional Pixel decoder, making the model more efficient and reducing the number of parameters. Additionally, we propose a Transformer decoder that refines object-specific features across multiple scales. Finally, we introduce the 2025 Revvity Full Cell Segmentation Dataset, a unique resource with detailed annotations of overlapping cell cytoplasm in brightfield images, setting a new benchmark for biomedical instance segmentation. Experiments on multiple public datasets and our own show that IAUNet outperforms most state-of-the-art fully convolutional, transformer-based, and query-based models, as well as models designed specifically for cell segmentation, setting a strong baseline for cell instance segmentation tasks. Code is available at https://github.com/SlavkoPrytula/IAUNet


Apple Is Pushing AI Into More of Its Products--but Still Lacks a State-of-the-Art Model

WIRED

Apple continued its slow-and-steady approach to integrating artificial intelligence into devices like the iPhone, Mac, and Apple Watch on Monday, announcing a raft of new features and upgrades at WWDC. The company also premiered the Foundation Models framework, a way for developers to write code that taps into Apple's AI models. Among the buzzier AI announcements at the event was Live Translation, a feature that translates phone and FaceTime calls from one language to another in real time. Apple also showed off Workout Buddy, an AI-powered voice helper designed to provide words of encouragement and useful updates during exercise. "This is your second run this week," Workout Buddy told a jogging woman in a demo video.


d6288499d0083cc34e60a077b7c4b3e1-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their efforts and constructive comments, which help improve the quality of our paper. Based on the analysis of the first and second moments of the estimator presented in Theorems 5.1 and 5.2, a Chebyshev-type error bound can easily be obtained; we will present it as a corollary in the final version. We will add this discussion to the final version. Regarding a better figure representing MSE vs. p: thanks for the suggestion, and we will revise our paper accordingly. The figure shows that the RMSE decreases as p increases.
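The Chebyshev-type bound referred to above follows from the standard argument: the first and second moments determine the variance, which plugs directly into Chebyshev's inequality. The notation below is generic and illustrative, not necessarily the paper's own:

```latex
% Chebyshev-type error bound from the first two moments of an
% estimator \hat{\theta} (generic symbols, not the paper's notation):
\mathrm{Var}(\hat{\theta})
  = \mathbb{E}\big[\hat{\theta}^{2}\big] - \big(\mathbb{E}[\hat{\theta}]\big)^{2},
\qquad
\Pr\Big( \big|\hat{\theta} - \mathbb{E}[\hat{\theta}]\big| \ge \varepsilon \Big)
  \le \frac{\mathrm{Var}(\hat{\theta})}{\varepsilon^{2}}.
```

With the moments from Theorems 5.1 and 5.2 substituted in, this is exactly the kind of corollary the authors say they will state.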


Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

Sun, Haoxiang, Min, Yingqian, Chen, Zhipeng, Zhao, Wayne Xin, Liu, Zheng, Wang, Zhongyuan, Fang, Lei, Wen, Ji-Rong

arXiv.org Artificial Intelligence

In recent years, the rapid development of large reasoning models has resulted in the saturation of existing benchmarks for evaluating mathematical reasoning, highlighting the urgent need for more challenging and rigorous evaluation frameworks. To address this gap, we introduce OlymMATH, a novel Olympiad-level mathematical benchmark, designed to rigorously test the complex reasoning capabilities of LLMs. OlymMATH features 200 meticulously curated problems, each manually verified and available in parallel English and Chinese versions. The problems are systematically organized into two distinct difficulty tiers: (1) AIME-level problems (easy) that establish a baseline for mathematical reasoning assessment, and (2) significantly more challenging problems (hard) designed to push the boundaries of current state-of-the-art models. In our benchmark, these problems span four core mathematical fields, each including a verifiable numerical solution to enable objective, rule-based evaluation. Empirical results underscore the significant challenge presented by OlymMATH, with state-of-the-art models including DeepSeek-R1 and OpenAI's o3-mini demonstrating notably limited accuracy on the hard subset. Furthermore, the benchmark facilitates comprehensive bilingual assessment of mathematical reasoning abilities, a critical dimension that remains largely unaddressed in mainstream mathematical reasoning benchmarks. We release the OlymMATH benchmark at the STILL project: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.
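The "verifiable numerical solution" design enables rule-based grading: extract a number from the model's output and compare it to the reference within a tolerance. A minimal sketch of such a checker is below; it is a generic illustration, not the benchmark's actual evaluation script, and `check_answer` is a hypothetical name.

```python
import math
import re

def check_answer(model_output, reference, rel_tol=1e-6):
    """Rule-based numeric grading: take the last number appearing in the
    model's output and compare it to the reference answer within a
    relative tolerance. Returns False if no number is found."""
    matches = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?", model_output)
    if not matches:
        return False
    try:
        value = float(matches[-1])
    except ValueError:
        return False
    return math.isclose(value, float(reference), rel_tol=rel_tol)
```

Taking the last number is a common heuristic for final-answer extraction; a real grader would typically also normalize formats such as boxed answers or fractions.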